Perform a random sub-selection from a total population
The example below demonstrates how to draw random samples from a larger population. This is done using the sample command.
The first input parameter specifies the size of the sample. If it is a decimal number (0.0-1.0), that share of the population is drawn. If you want a random sample of a specific number of individuals instead, enter an integer > 1000.
The last input parameter is a seed: a positive integer that you are free to choose yourself. Reusing the same seed ensures that the sample contains the same individuals in subsequent runs. If you want each draw to give a different sample, choose a new seed each time the command is run.
//Connect to datastore
require no.ssb.fdb:23 as db
//Create a dataset with all residents in Norway as of 1 January 2021, then draw a 10% sample from it
create-dataset totalpop
import db/BEFOLKNING_STATUSKODE 2021-01-01 as registerstatus21
keep if registerstatus21 == '1'
sample 0.1 999
//Create a dataset with all residents in Norway as of 1 January 2021, then draw a sample of 5,000 individuals
create-dataset totalpop2
import db/BEFOLKNING_STATUSKODE 2021-01-01 as registerstatus21
keep if registerstatus21 == '1'
sample 5000 888
//Create a dataset with all residents in Norway as of 1 January 2021, then draw another sample of 5,000 individuals with a different seed (a different selection from the previous one)
create-dataset totalpop3
import db/BEFOLKNING_STATUSKODE 2021-01-01 as registerstatus21
keep if registerstatus21 == '1'
sample 5000 950
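
The role of the two parameters can also be illustrated outside microdata.no. The Python sketch below is only an analogy and not how the sample command is implemented: the function name sample, the made-up population of 100,000 ID numbers, and the use of Python's random module are all assumptions chosen for illustration. It shows a share versus a fixed count as the first parameter, and how reusing or changing the seed affects which individuals are drawn.

```python
import random


def sample(population, size, seed):
    """Hypothetical stand-in for the sample command.

    size: a float between 0.0 and 1.0 is read as a share of the population,
          an integer is read as a fixed number of individuals.
    seed: positive integer that makes the draw reproducible.
    """
    rng = random.Random(seed)  # separate generator, so the draw depends only on the seed
    if isinstance(size, float):
        if not 0.0 < size < 1.0:
            raise ValueError("a share must lie between 0.0 and 1.0")
        count = round(len(population) * size)
    else:
        count = size
    return rng.sample(population, count)  # draw without replacement


# Made-up population of ID numbers (assumption, not register data)
population = list(range(100_000))

ten_percent = sample(population, 0.1, 999)    # share of the population
first_draw = sample(population, 5000, 888)    # fixed number of individuals
repeat_draw = sample(population, 5000, 888)   # same seed as first_draw
second_draw = sample(population, 5000, 950)   # new seed

print(len(ten_percent))                         # size of the 10% sample
print(first_draw == repeat_draw)                # same seed, same individuals
print(len(set(first_draw) & set(second_draw)))  # individuals present in both draws
```

In this sketch, draws with different seeds are independent of each other, so the two samples of 5,000 differ as a whole but may still have some individuals in common.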